
DNOI-4DRO: Deep 4D Radar Odometry with Differentiable Neural-Optimization Iterations

Lu, Shouyi, Zhou, Huanyu, Zhuo, Guirong, Tang, Xiao

arXiv.org Artificial Intelligence

A novel learning-optimization-combined 4D radar odometry model, named DNOI-4DRO, is proposed in this paper. The proposed model seamlessly integrates traditional geometric optimization with end-to-end neural network training, leveraging an innovative differentiable neural-optimization iteration operator. In this framework, point-wise motion flow is first estimated using a neural network, followed by the construction of a cost function based on the relationship between point motion and pose in 3D space. The radar pose is then refined using Gauss-Newton updates. Additionally, we design a dual-stream 4D radar backbone that integrates multi-scale geometric features and clustering-based class-aware features to enhance the representation of sparse 4D radar point clouds. Extensive experiments on the VoD and Snail-Radar datasets demonstrate the superior performance of our model, which outperforms recent classical and learning-based approaches. Notably, our method even achieves results comparable to A-LOAM with mapping optimization using LiDAR point clouds as input. Our models and code will be publicly released.
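
The Gauss-Newton refinement step described above can be sketched in miniature: given per-point correspondences induced by an estimated motion flow, solve for the rigid pose that best explains them. The snippet below is a hedged 2D toy (the paper operates on 3D radar points with a neural flow estimator inside a differentiable optimization layer); the function name and problem setup are illustrative assumptions, not the authors' code.

```python
import numpy as np

def gauss_newton_pose_2d(src, dst, iters=10):
    """Recover a 2D rigid pose (theta, tx, ty) aligning src -> dst
    by Gauss-Newton on the point-to-point residual. A toy stand-in
    for the paper's 3D pose refinement step."""
    theta, tx, ty = 0.0, 0.0, 0.0
    for _ in range(iters):
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        pred = src @ R.T + np.array([tx, ty])
        r = (pred - dst).ravel()                   # stacked residuals (2N,)
        # Jacobian of each residual w.r.t. (theta, tx, ty)
        J = np.zeros((2 * len(src), 3))
        dR = np.array([[-s, -c], [c, -s]])         # dR/dtheta
        J[:, 0] = (src @ dR.T).ravel()
        J[0::2, 1] = 1.0                           # x-residuals w.r.t. tx
        J[1::2, 2] = 1.0                           # y-residuals w.r.t. ty
        delta = np.linalg.solve(J.T @ J, -J.T @ r) # normal equations
        theta, tx, ty = theta + delta[0], tx + delta[1], ty + delta[2]
    return theta, tx, ty
```

With noise-free correspondences the iteration converges to the generating pose in a handful of steps; the paper's contribution is making such an update differentiable so it can be trained end to end.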


Improved Regret Bounds for Gaussian Process Upper Confidence Bound in Bayesian Optimization

Iwazaki, Shogo

arXiv.org Machine Learning

This paper addresses the Bayesian optimization problem (also referred to as the Bayesian setting of the Gaussian process bandit), where the learner seeks to minimize the regret under a function drawn from a known Gaussian process (GP). Under a Matérn kernel with a certain degree of smoothness, we show that the Gaussian process upper confidence bound (GP-UCB) algorithm achieves $\tilde{O}(\sqrt{T})$ cumulative regret with high probability. Furthermore, our analysis yields $O(\sqrt{T \ln^2 T})$ regret under a squared exponential kernel. These results fill the gap between the existing regret upper bound for GP-UCB and the best-known bound provided by Scarlett (2018). The key idea in our proof is to capture the concentration behavior of the input sequence realized by GP-UCB, enabling a more refined analysis of the GP's information gain.
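
As a reference point for the algorithm being analyzed, one GP-UCB step can be sketched as follows: fit a GP posterior to the observations and pick the candidate maximizing the upper confidence bound mu + sqrt(beta) * sigma. This is a generic illustration with an RBF kernel, a fixed beta, and invented helper names (`rbf`, `gp_ucb_pick`), not the paper's construction — the paper concerns regret bounds under Matérn and squared exponential kernels, not implementation.

```python
import numpy as np

def rbf(a, b, ls=0.5):
    """Squared-exponential kernel on 1-D inputs, prior variance 1."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ls ** 2))

def gp_ucb_pick(X_obs, y_obs, candidates, beta=4.0, noise=1e-6):
    """One GP-UCB step: return the candidate maximizing mu + sqrt(beta)*sigma."""
    if len(X_obs) == 0:
        return candidates[0]
    K = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = rbf(candidates, X_obs)
    mu = Ks @ np.linalg.solve(K, y_obs)            # posterior mean
    v = np.linalg.solve(K, Ks.T)
    var = np.clip(1.0 - np.sum(Ks * v.T, axis=1), 0.0, None)  # posterior variance
    ucb = mu + np.sqrt(beta) * np.sqrt(var)
    return candidates[int(np.argmax(ucb))]
```

In the theory, beta is a carefully chosen (typically growing) confidence parameter; the constant used here is purely for illustration.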


DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective

Peng, Dengyun, Zhou, Yuhang, Chen, Qiguang, Liu, Jinhao, Chen, Jingjing, Qin, Libo

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have achieved remarkable success across diverse tasks, largely driven by well-designed prompts. However, crafting and selecting such prompts often requires considerable human effort, significantly limiting scalability. To mitigate this, recent studies have explored automated prompt optimization as a promising solution. Despite these efforts, existing methods still face critical challenges in robustness, efficiency, and generalization. To address these challenges systematically, we first conduct an empirical analysis to identify the limitations of the current reflection-based prompt optimization paradigm. Building on these insights, we propose 7 innovative approaches inspired by traditional deep learning paradigms for prompt optimization (DLPO), seamlessly integrating these concepts into text-based gradient optimization. Through these advancements, we progressively tackle the aforementioned challenges and validate our methods through extensive experimentation. We hope our study not only provides valuable guidance for future research but also offers a comprehensive understanding of the challenges and potential solutions in prompt optimization. Our code is available at https://github.com/sfasfaffa/DLPO.


Reviews: Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior

Neural Information Processing Systems

The problem of hyperparameter tuning in GPs is indeed relevant. Unfortunately, I do not feel that the work is strong enough, in its current state, to provide clear enough results that other researchers could build upon. I am familiar with the analysis of GP-UCB and with Bayesian optimization. This paper addresses the problem of hyperparameter tuning in Bayesian optimization (e.g., with Gaussian processes). In practice, these hyperparameters are often tuned by maximizing the likelihood, at the expense of losing theoretical guarantees.


Translating speech with just images

Oneata, Dan, Kamper, Herman

arXiv.org Artificial Intelligence

Visually grounded speech models link speech to images. We extend this connection by linking images to text via an existing image captioning system, and as a result gain the ability to map speech audio directly to text. This approach can be used for speech translation with just images by having the audio in a different language from the generated captions. We investigate such a system on a real low-resource language, Yorùbá, and propose a Yorùbá-to-English speech translation model that leverages pretrained components in order to be able to learn in the low-resource regime. To limit overfitting, we find that it is essential to use a decoding scheme that produces diverse image captions for training. Results show that the predicted translations capture the main semantics of the spoken audio, albeit in a simpler and shorter form.


Deploying Graph Neural Networks in Wireless Networks: A Link Stability Viewpoint

Li, Jun, Zhang, Weiwei, Wei, Kang, Chen, Guangji, Shi, Long, Chen, Wen

arXiv.org Artificial Intelligence

As an emerging artificial intelligence technology, graph neural networks (GNNs) have exhibited promising performance across a wide range of graph-related applications. However, the information exchanged among neighboring nodes in a GNN poses new challenges in resource-constrained scenarios, especially in wireless systems. In practical wireless systems, the communication links among nodes are usually unreliable due to wireless fading and receiver noise, resulting in performance degradation of GNNs. To improve the learning performance of GNNs, we aim to maximize the long-term average (LTA) number of communication links through optimized power control under energy consumption constraints. Using the Lyapunov optimization method, we first transform the intractable long-term problem into a deterministic problem in each time slot by converting the long-term energy constraints into the objective function. Despite the non-convexity of this combinatorial optimization problem, we address it by equivalently solving a sequence of convex feasibility problems together with a greedy-based solver. Simulation results demonstrate the superiority of our proposed scheme over the baselines.
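
The Lyapunov conversion of a long-term constraint into per-slot decisions can be illustrated with a drift-plus-penalty toy: a virtual queue Q tracks cumulative energy overuse, and each slot greedily maximizes V*links - Q*energy. The candidate actions and parameter values below are invented for illustration; the paper's actual per-slot problem is non-convex and is solved via convex feasibility subproblems and a greedy solver.

```python
def lyapunov_power_control(energy_budget, candidates, horizon, V=10.0):
    """Drift-plus-penalty sketch: each slot pick an action
    (power, expected_links, energy) maximizing V*links - Q*energy,
    where Q is a virtual queue enforcing the long-term energy budget.
    Candidate tuples are hypothetical, not from the paper."""
    Q, tot_links, tot_energy = 0.0, 0.0, 0.0
    for _ in range(horizon):
        scores = [V * links - Q * energy for (_, links, energy) in candidates]
        _, links, energy = candidates[scores.index(max(scores))]
        Q = max(Q + energy - energy_budget, 0.0)   # virtual energy queue update
        tot_links += links
        tot_energy += energy
    return tot_links / horizon, tot_energy / horizon
```

A larger V weights link count more heavily at the cost of a longer transient before the energy budget is met, mirroring the usual Lyapunov utility/backlog trade-off.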


Enhancing Fault Detection for Large Language Models via Mutation-Based Confidence Smoothing

Hu, Qiang, Wen, Jin, Cordy, Maxime, Huang, Yuheng, Xie, Xiaofei, Ma, Lei

arXiv.org Artificial Intelligence

Large language models (LLMs) have achieved great success in multiple application domains and have recently attracted huge attention from different research communities. Unfortunately, even the best LLM still has many faults: inputs it cannot correctly predict. Such faults harm the usability of LLMs. Revealing them quickly is important but challenging, for two reasons: 1) the heavy labeling effort needed to prepare the test data, and 2) the monetary cost of accessing closed-source LLMs such as GPT4. To handle this problem, the traditional deep learning testing field has proposed test selection methods that test deep learning models efficiently by prioritizing faults. However, the usefulness of these methods on LLMs is unclear and underexplored. In this paper, we first study the effectiveness of existing fault detection methods for LLMs. Experimental results on four different tasks (including both code tasks and natural language processing tasks) and four LLMs (e.g., LLaMA and GPT4) demonstrate that existing fault detection methods cannot perform well on LLMs (e.g., seven out of eight methods perform worse than random selection on LLaMA). To enhance existing fault detection methods, we propose MuCS, a prompt Mutation-based prediction Confidence Smoothing method for LLMs. Concretely, we mutate the prompts and compute the average prediction confidence of all mutants as the input to fault detection methods. The results show that our proposed solution significantly enhances existing methods, improving test relative coverage by up to 97.64%.
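
The core MuCS idea — smooth a model's confidence by averaging it over mutated prompts — fits in a few lines. The character-level mutation below is a deliberately crude stand-in (the paper uses richer text mutations), and `model_conf` abstracts away the actual LLM call; both names are assumptions for this sketch.

```python
import random

def mutate_prompt(prompt, rng):
    """Toy mutation: flip the case of one random character
    (a stand-in for the paper's text mutation operators)."""
    chars = list(prompt)
    i = rng.randrange(len(chars))
    chars[i] = chars[i].swapcase()
    return "".join(chars)

def smoothed_confidence(model_conf, prompt, n_mutants=8, seed=0):
    """MuCS-style smoothing sketch: average the model's prediction
    confidence over several mutated prompts. `model_conf` is any
    callable mapping a prompt string to a confidence in [0, 1]."""
    rng = random.Random(seed)
    mutants = [mutate_prompt(prompt, rng) for _ in range(n_mutants)]
    return sum(model_conf(m) for m in mutants) / n_mutants
```

The smoothed score, rather than the raw confidence, is then fed to an existing confidence-based test selection method.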


Smooth Path Planning with Subharmonic Artificial Potential Field

Peng, Bo, Zhang, Lingke, Xiong, Rong

arXiv.org Artificial Intelligence

When a mobile robot plans its path in an environment with obstacles using the Artificial Potential Field (APF) strategy, it may fall into a local minimum and fail to reach the goal. Moreover, the derivatives of the APF explode close to obstacles, causing poor planning performance. To solve these problems, exponential functions are used to modify the potential fields' formulas. The potential functions can be made subharmonic when the distance between the robot and obstacles is above a predefined threshold. Subharmonic functions have no local minima, and the derivatives of exponential functions grow mildly when the robot is close to obstacles, thus eliminating both problems in theory. A circular sampling technique keeps the robot beyond a danger distance from obstacles and supports the construction of subharmonic functions. Simulations show that, with the proposed methods, mobile robots can bypass local minimum points and construct a smooth path that reaches the goal.
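
The benefit of an exponential repulsive term can be illustrated with a simple gradient-descent planner. This sketch uses U_rep = k_rep * exp(-alpha * d), whose derivative stays bounded as d shrinks, matching the mild growth the abstract describes near obstacles; the paper's subharmonic construction, distance threshold, and circular sampling are not reproduced here, and all parameter values are illustrative.

```python
import numpy as np

def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=1.0, alpha=4.0, lr=0.05):
    """One gradient-descent step on U = 0.5*k_att*||p-g||^2
    + sum_i k_rep*exp(-alpha*d_i). The exponential repulsion has a
    bounded gradient near obstacles, unlike classical 1/d potentials."""
    grad = k_att * (pos - goal)                          # attractive term
    for obs in obstacles:
        diff = pos - obs
        d = np.linalg.norm(diff)
        # grad of k_rep*exp(-alpha*d) is -k_rep*alpha*exp(-alpha*d)*diff/d
        grad += -k_rep * alpha * np.exp(-alpha * d) * diff / max(d, 1e-9)
    return pos - lr * grad

def plan(start, goal, obstacles, steps=500):
    """Descend the potential from start; returns the final position."""
    pos = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    for _ in range(steps):
        pos = apf_step(pos, goal, obstacles)
        if np.linalg.norm(pos - goal) < 1e-2:
            break
    return pos
```

With an obstacle slightly off the start-goal line, the descent curves smoothly around it instead of oscillating, since the repulsive gradient never blows up.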


Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective

Feng, Guhao, Zhang, Bohang, Gu, Yuntian, Ye, Haotian, He, Di, Wang, Liwei

arXiv.org Machine Learning

Recent studies have discovered that Chain-of-Thought prompting (CoT) can dramatically improve the performance of Large Language Models (LLMs), particularly when dealing with complex tasks involving mathematics or reasoning. Despite the enormous empirical success, the underlying mechanisms behind CoT and how it unlocks the potential of LLMs remain elusive. In this paper, we take a first step towards theoretically answering these questions. Specifically, we examine the expressivity of LLMs with CoT in solving fundamental mathematical and decision-making problems. By using circuit complexity theory, we first give impossibility results showing that bounded-depth Transformers are unable to directly produce correct answers for basic arithmetic/equation tasks unless the model size grows super-polynomially with respect to the input length. In contrast, we then prove by construction that autoregressive Transformers of constant size suffice to solve both tasks by generating CoT derivations using a commonly used math language format. Moreover, we show LLMs with CoT can handle a general class of decision-making problems known as Dynamic Programming, thus justifying its power in tackling complex real-world tasks. Finally, an extensive set of experiments show that, while Transformers always fail to directly predict the answers, they can consistently learn to generate correct solutions step-by-step given sufficient CoT demonstrations.


Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing

Lan, Yushi, Tan, Feitong, Qiu, Di, Xu, Qiangeng, Genova, Kyle, Huang, Zeng, Fanello, Sean, Pandey, Rohit, Funkhouser, Thomas, Loy, Chen Change, Zhang, Yinda

arXiv.org Artificial Intelligence

We present a novel framework for generating photorealistic 3D human heads and subsequently manipulating and reposing them with remarkable flexibility. The proposed approach leverages an implicit function representation of 3D human heads, employing 3D Gaussians anchored on a parametric face model. To enhance representational capabilities and encode spatial information, we embed a lightweight tri-plane payload within each Gaussian.